Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graph Library #1521

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Conversation

Marsella8
Copy link
Contributor

@Marsella8 Marsella8 commented Oct 10, 2024

Description of changes:

Extending testing functionality for graph library
Bug fixing
General clean up

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

This change is Reviewable

@Marsella8 Marsella8 requested a review from lockshaw October 10, 2024 04:25
Copy link

codecov bot commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 89.47368% with 72 lines in your changes missing coverage. Please review.

Project coverage is 79.05%. Comparing base (1d5140d) to head (8564b8b).

Files with missing lines Patch % Lines
.../undirected/algorithms/get_connected_components.cc 0.00% 41 Missing ⚠️
...h/series_parallel/series_parallel_decomposition.cc 66.66% 12 Missing ⚠️
.../utils/graph/instances/hashmap_undirected_graph.cc 37.50% 10 Missing ⚠️
lib/utils/src/utils/graph/algorithms.cc 0.00% 3 Missing ⚠️
.../graph/instances/unordered_set_undirected_graph.cc 0.00% 2 Missing ⚠️
...rc/utils/graph/undirected/undirected_edge_query.cc 0.00% 2 Missing ⚠️
...aph/undirected/algorithms/get_neighboring_nodes.cc 0.00% 1 Missing ⚠️
...tils/src/utils/graph/undirected/undirected_edge.cc 0.00% 1 Missing ⚠️
Additional details and impacted files
@@                Coverage Diff                @@
##           repo-refactor    #1521      +/-   ##
=================================================
+ Coverage          78.16%   79.05%   +0.89%     
=================================================
  Files                860      873      +13     
  Lines              27994    28703     +709     
  Branches             770      772       +2     
=================================================
+ Hits               21881    22691     +810     
+ Misses              6113     6012     -101     
Flag Coverage Δ
unittests 79.05% <89.47%> (+0.89%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
lib/utils/include/utils/containers/find.h 100.00% <100.00%> (ø)
...nclude/utils/graph/dataflow_graph/dataflow_graph.h 44.44% <ø> (-5.56%) ⬇️
lib/utils/include/utils/graph/digraph/digraph.h 100.00% <ø> (ø)
...b/utils/include/utils/graph/digraph/digraph_view.h 100.00% <ø> (ø)
...ls/include/utils/graph/multidigraph/multidigraph.h 100.00% <ø> (ø)
lib/utils/include/utils/graph/node/graph_view.h 100.00% <ø> (ø)
.../include/utils/graph/undirected/undirected_graph.h 100.00% <ø> (ø)
...ude/utils/graph/undirected/undirected_graph_view.h 100.00% <ø> (+20.00%) ⬆️
...ries_parallel/get_series_parallel_decomposition.cc 98.14% <100.00%> (-1.86%) ⬇️
.../utils/graph/series_parallel/parallel_reduction.cc 100.00% <100.00%> (ø)
... and 21 more

... and 123 files with indirect coverage changes

@Marsella8 Marsella8 marked this pull request as ready for review October 18, 2024 22:02
@lockshaw lockshaw changed the base branch from repo-refactor to master December 16, 2024 08:37
Copy link
Contributor Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 41 files reviewed, 1 unresolved discussion (waiting on @lockshaw)


lib/utils/src/utils/graph/series_parallel/series_parallel_decomposition.cc line 82 at r2 (raw file):

}

bool is_empty(Node const &node) {

borrowed from #1562, will fix them there if needed

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 23 of 32 files at r1, 12 of 16 files at r2, 7 of 9 files at r3, all commit messages.
Reviewable status: 42 of 47 files reviewed, 35 unresolved discussions (waiting on @Marsella8)


lib/utils/test/src/utils/graph/undirected/algorithms/get_connected_components.cc line 10 at r3 (raw file):

using namespace FlexFlow;

TEST_CASE("get_connected_components") {

Suggestion:

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("get_connected_components") {

lib/utils/test/src/utils/graph/undirected/algorithms/get_connected_components.cc line 41 at r3 (raw file):

  }

  SUBCASE("3 components") {

Also test the case where everything is one component

Also test the case where the graph is empty


lib/utils/test/src/utils/graph/digraph/algorithms/get_dominators.cc line 34 at r3 (raw file):

      std::unordered_set<Node> correct = {n.at(0)};
      CHECK(correct == result);
    }

Suggestion:

    SUBCASE("small acyclic graph") {
      std::vector<Node> n = add_nodes(g, 4);
      std::vector<DirectedEdge> e = {
          DirectedEdge{n.at(0), n.at(3)},
          DirectedEdge{n.at(0), n.at(1)},
          DirectedEdge{n.at(0), n.at(2)},
          DirectedEdge{n.at(1), n.at(2)},
      };
      add_edges(g, e);

      SUBCASE("get_dominators(..., Node const &)") {
        Node node = n.at(2);
        std::unordered_set<Node> correct = {n.at(0), n.at(2)};
        std::unordered_set<Node> result = get_dominators(g, node);
        CHECK(correct == result);
      }
  
      SUBCASE("get_dominators(..., std::unordered_set<Node> const &)") {
        std::unordered_set<Node> nodes = {n.at(1), n.at(3)};
        std::unordered_set<Node> result = get_dominators(g, nodes);
        std::unordered_set<Node> correct = {n.at(0)};
        CHECK(correct == result);
      }
    }

lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 20 at r3 (raw file):

std::unordered_multiset<Node> get_nodes(Node const &);

bool is_empty(Node const &node);

How would you end up with an empty SP decomposition? That doesn't seem like it should be allowed (i.e., might be a good idea to change the container in the splits to be some wrapper type of std::vector that requires itself to be non-empty)


lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 28 at r3 (raw file):

SeriesParallelDecomposition delete_node(SeriesParallelDecomposition sp,
                                        Node const &node);

How is this a well-behaved operation? Like, what are the semantics of this? Are you sure we want this function as part of the API for SeriesParallelDecomposition?

Code quote:

SeriesParallelDecomposition delete_node(SeriesParallelDecomposition sp,
                                        Node const &node);

lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 30 at r3 (raw file):

                                        Node const &node);

// duplicate nodes within `sp` are counted multiple times

Doxygen style please 🙂

Code quote:

// duplicate nodes within `sp` are counted multiple times

lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 33 at r3 (raw file):

size_t num_nodes(SeriesParallelDecomposition const &sp);

SeriesParallelDecomposition serial_composition(

For consistency, never use "serial" in the FF codebase, always use "series"

Suggestion:

SeriesParallelDecomposition series_composition(

lib/utils/include/utils/graph/series_parallel/series_reduction.h line 19 at r3 (raw file):

std::unordered_set<std::vector<MultiDiEdge>>
    find_all_extended_series_reductions(MultiDiGraphView const &g);

What is an "extended series reduction"? Correct me if I'm wrong here but I don't think this is standard terminology(?) in which case it would be good to have a doxygen docstring here with some explanation

Also, it might be nice to create an actual datatype (e.g., ExtendedSeriesReduction) that is a wrapper type around std::vector<MultiDiEdge> rather than just using a raw vector.

Code quote:

    find_all_extended_series_reductions(MultiDiGraphView const &g);

lib/utils/test/src/utils/graph/undirected/undirected.cc line 15 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE_TEMPLATE(

Why are there two tests here?


lib/utils/test/src/utils/graph/undirected/undirected.cc line 18 at r3 (raw file):

      "UndirectedGraph implementations", T, HashmapUndirectedGraph) {

    RC_SUBCASE("Full", [&]() {

Why is this using rapidcheck?


lib/utils/src/utils/graph/views/views.cc line 2 at r3 (raw file):

#include "utils/graph/views/views.h"
#include "utils/bidict/algorithms/right_entries.h"

I'm not seeing `right_entries used anywhere here in new code. Was this already necessary and just missing, or can this be removed?


lib/utils/src/utils/graph/views/views.cc line 70 at r3 (raw file):

UndirectedEdge to_undirected_edge(DirectedEdge const &e) {
  return UndirectedEdge{{e.src, e.dst}};

Does commutative_pair not have an explicit constructor?

Code quote:

{e.src, e.dst}

lib/utils/src/utils/graph/views/views.cc line 122 at r3 (raw file):

  std::unordered_set<UndirectedEdge> undirected_edges =
      set_union(g.query_edges(UndirectedEdgeQuery{q.srcs}),
                g.query_edges(UndirectedEdgeQuery{q.dsts}));

Can't you just union the query_sets (i.e., q.srcs and q.dsts) and then do the query, rather than unioning the query results?

Code quote:

  std::unordered_set<UndirectedEdge> undirected_edges =
      set_union(g.query_edges(UndirectedEdgeQuery{q.srcs}),
                g.query_edges(UndirectedEdgeQuery{q.dsts}));

lib/utils/test/src/utils/graph/digraph/algorithms/traversal.cc line 12 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("get_unchecked_dfs_ordering") {

IIRC the semantics of unchecked_dfs_ordering are essentially that it is free to visit nodes multiple times (i.e., in a diamond graph of 4 nodes (a -> b, a -> c, b -> d, c -> d), it would either yield [a, b, d, c, d] or [a, c, d, b, d]--it would be good to add a testcase for this


lib/utils/test/src/utils/graph/digraph/algorithms/traversal.cc line 24 at r3 (raw file):

      std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[0]});
      CHECK(correct == result);
    }

Suggestion:

    SUBCASE("simple path") {
      std::vector<Node> correct = {n[0], n[1], n[2], n[3]};
      std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[0]});
      CHECK(correct == result);
    }
    
    SUBCASE("start from non-initial node") {
      std::vector<Node> correct = {n[1], n[2], n[3]};
      std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[1]});
      CHECK(correct == result);
    }

lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc line 11 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("DiGraph - algorithms.cc") {

Prefer one TEST_CASE per function, rather than one SUBCASE per function


lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc line 16 at r3 (raw file):

    std::vector<Node> n = add_nodes(g, 4);
    std::vector<DirectedEdge> e = {
        DirectedEdge{n[0], n[1]},

Prefer .at for bounds checking

Suggestion:

        DirectedEdge{n.at(0), n.at(1)},

lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc line 55 at r3 (raw file):

    }

    SUBCASE("get_sinks") {

Potentially rename to get_terminal_nodes/get_initial_nodes following https://reviewable.io/reviews/flexflow/flexflow-train/1565#-OGCWYwtE_rcz9e1WizH


lib/utils/test/src/utils/containers/contains.cc line 19 at r3 (raw file):

      std::unordered_set<int> s = {1, 2, 3, 4, 5};
      CHECK(contains(s, 3));
      CHECK(!contains(s, 6));

Doesn't doctest require this to use CHECK_FALSE? iirc doctest has an issue with CHECK(!...)

Code quote:

      CHECK(!contains(s, 6));

lib/utils/test/src/utils/graph/views/views.cc line 49 at r3 (raw file):

      CHECK(result == expected);
    }
  }

Probably the best way to test these (due to the complexity of the querying interface) would be to use rapidcheck to generate a bunch of operations and then ensure the behaviors are the same between the view and the non-view graph that would be generated. This is going to be annoying to write, but without restructuring things this is probably the correct approach. The good thing is that the equivalence testing code only needs to be written once per graphview type and then can be reused for each of the views.

Code quote:

  TEST_CASE("UndirectedSubgraphView") {
    UndirectedGraph g = UndirectedGraph::create<HashmapUndirectedGraph>();
    std::vector<Node> n = add_nodes(g, 5);
    add_edges(g,
              {UndirectedEdge{{n.at(0), n.at(3)}},
               UndirectedEdge{{n.at(1), n.at(1)}},
               UndirectedEdge{{n.at(1), n.at(2)}},
               UndirectedEdge{{n.at(1), n.at(3)}},
               UndirectedEdge{{n.at(2), n.at(3)}},
               UndirectedEdge{{n.at(2), n.at(4)}}});
    std::unordered_set<Node> sub_nodes = {n.at(0), n.at(1), n.at(3)};
    UndirectedGraphView view = view_subgraph(g, sub_nodes);

    SUBCASE("get_nodes") {
      std::unordered_set<Node> expected = {n.at(0), n.at(1), n.at(3)};

      std::unordered_set<Node> result = get_nodes(view);

      CHECK(result == expected);
    }

    SUBCASE("get_edges") {
      std::unordered_set<UndirectedEdge> expected = {
          UndirectedEdge{{n.at(0), n.at(3)}},
          UndirectedEdge{{n.at(1), n.at(1)}},
          UndirectedEdge{{n.at(1), n.at(3)}},
      };

      std::unordered_set<UndirectedEdge> result = get_edges(view);

      CHECK(result == expected);
    }
  }

lib/utils/include/utils/graph/series_parallel/parallel_reduction.h line 15 at r3 (raw file):

    find_parallel_reduction(MultiDiGraphView const &);

std::unordered_map<DirectedEdge, std::unordered_set<MultiDiEdge>>

What is an "extended parallel reduction"? Correct me if I'm wrong here but I don't think this is standard terminology(?) in which case it would be good to have a doxygen docstring here with some explanation

Also, it might be nice to create an actual datatype (e.g., ExtendedParallelReduction) that is a wrapper type around std::unordered_set<MultiDiEdge> rather than just using a raw unordered set

If this is just for accelerating the SP decomposition algorithm's rewriting step, it might be better to put these functions (and the corresponding ones for "extended series reductions" in that file rather than in the general "parallel_reduction.h" header)


lib/utils/test/src/utils/graph/multidigraph/algorithms/get_outgoing_edges.cc line 35 at r3 (raw file):

    SUBCASE("node has no outgoing edges") {
      std::unordered_set<MultiDiEdge> result = get_outgoing_edges(g, n.at(2));

Would be nice for this node to have some incoming edges to ensure they're not accidentally picked up


lib/utils/test/src/utils/graph/multidigraph/algorithms/get_incoming_edges.cc line 31 at r3 (raw file):

    SUBCASE("node has no incoming edges") {
      std::unordered_set<MultiDiEdge> result = get_incoming_edges(g, n.at(2));

Would be nice for this node to have some outgoing edges to ensure they're not accidentally picked up


lib/utils/include/utils/graph/node/node.struct.toml line 9 at r3 (raw file):

  "fmt",
  "json",
  "rapidcheck",

How are you using a default Arbitrary instance for Node? I have no issue adding generators for Node, but the default instance seems unlikely to be helpful as Node is only meaningful in the context of a graph?


lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 263 at r3 (raw file):

          find_all_extended_series_reductions(g);
      std::unordered_set<std::vector<MultiDiEdge>> correct = {
          {e[0], e[1], e[2]}};

Prefer .at for bounds checking

Suggestion:

          {e.at(0), e.at(1), e.at(2)}};

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 267 at r3 (raw file):

    }

    SUBCASE("2 linear strands") {

Suggestion:

    SUBCASE("2 linear strands that share a common terminal node") {

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 277 at r3 (raw file):

      std::unordered_set<std::vector<MultiDiEdge>> result =
          find_all_extended_series_reductions(g);
      std::unordered_set<std::vector<MultiDiEdge>> correct = {{e[0], e[2]},

Prefer .at for bounds checking


lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 300 at r3 (raw file):

          find_all_extended_series_reductions(g);
      std::unordered_set<std::vector<MultiDiEdge>> correct = {
          {e[0], e[2], e[7]}, {e[3], e[6]}, {e[5], e[9]}};

Prefer .at for bounds checking


lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 372 at r3 (raw file):

        std::unordered_set<MultiDiEdge> result_edges = get_edges(g);
        std::unordered_set<MultiDiEdge> correct_edges = [&] {
          std::unordered_set<MultiDiEdge> new_edges = unordered_set_of(e);

Why not just set_minus here?


lib/utils/include/utils/graph/multidigraph/algorithms/get_outgoing_edges.h line 12 at r3 (raw file):

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_outgoing_edges(MultiDiGraphView const &g);

Suggestion:

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_outgoing_edges_map(MultiDiGraphView const &g);

lib/utils/include/utils/graph/multidigraph/algorithms/get_incoming_edges.h line 12 at r3 (raw file):

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_incoming_edges(MultiDiGraphView const &g);

Suggestion:

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_incoming_edges_map(MultiDiGraphView const &g);

lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc line 27 at r3 (raw file):

      DirectedEdgeQuery result = directed_edge_query_all();

      CHECK(matches_edge(result, DirectedEdge{n.at(0), n.at(1)}));

Do you really need 5 checks here? It feels like the behavior of directed_edge_query_all could have been tested on a much smaller graph (or even without a graph at all, using rapidcheck). This feels like unnecessary reusing of g, which is why I recommend doing one TEST_CASE per function instead of one SUBCASE per function unless you have very good reason.

In fact, I don't see how g is necessary for any of the cases in this file?


lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc line 38 at r3 (raw file):

          DirectedEdgeQuery{query_set{n.at(0)}, query_set{n.at(1)}};

      CHECK(matches_edge(q, DirectedEdge{n.at(0), n.at(1)}));

Check that it doesn't match reversed edges

Code quote:

      CHECK(matches_edge(q, DirectedEdge{n.at(0), n.at(1)}));

lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc line 57 at r3 (raw file):

        CHECK(result == correct);
      }
      SUBCASE("intersection with std::nullopt") {

Suggestion:

      SUBCASE("intersection with matchall") {

lib/utils/src/utils/graph/series_parallel/series_reduction.cc line 79 at r3 (raw file):

  }
  return unordered_set_of(values(strands));
}

Newlines please--this is really hard to read

Code quote:

std::unordered_set<std::vector<MultiDiEdge>>
    find_all_extended_series_reductions(MultiDiGraphView const &g) {
  std::unordered_map<Node, std::unordered_set<MultiDiEdge>> incoming_edges =
      get_incoming_edges(g);
  std::unordered_map<Node, std::unordered_set<MultiDiEdge>> outgoing_edges =
      get_outgoing_edges(g);
  std::unordered_map<Node, std::vector<MultiDiEdge>> strands;
  std::unordered_map<Node, Node> node_to_head_of_strand;
  for (Node const &n : get_topological_ordering(g)) {
    if ((incoming_edges.at(n).size() == 1) &&
        (outgoing_edges.at(n).size() == 1)) {
      MultiDiEdge incoming = get_only(incoming_edges.at(n));
      MultiDiEdge outgoing = get_only(outgoing_edges.at(n));
      Node pre = g.get_multidiedge_src(incoming);
      if (contains_key(node_to_head_of_strand, pre)) {
        Node head = node_to_head_of_strand.at(pre);
        node_to_head_of_strand.emplace(n, head);
        strands.at(head).push_back(outgoing);
      } else {
        node_to_head_of_strand.emplace(n, n);
        strands[n].push_back(incoming);
        strands[n].push_back(outgoing);
      }
    }
  }
  return unordered_set_of(values(strands));
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants